AITopics | optimal schedule

Collaborating Authors

optimal schedule

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

af94ed0d6f5acc95f97170e3685f16c0-Paper.pdf

Neural Information Processing SystemsFeb-19-2026, 06:21:18 GMT

algorithm, online algorithm, prediction, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland (0.05)
South America > Brazil (0.04)
(7 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Bordelon, Blake, Mori, Francesco

arXiv.org Machine LearningFeb-5-2026

Setting the learning rate for a deep learning model is a critical part of successful training, yet choosing this hyperparameter is often done empirically with trial and error. In this work, we explore a solvable model of optimal learning rate schedules for a powerlaw random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $η_T^\star(t)$ where $t$ is the current iterate and $T$ is the total training horizon. This schedule is computed both numerically and analytically (when possible) using optimal control methods. Our analysis reveals two regimes which we term the easy phase and hard phase. In the easy phase the optimal schedule is a polynomial decay $η_T^\star(t) \simeq T^{-ξ} (1-t/T)^δ$ where $ξ$ and $δ$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant (in $T$) initial learning rate and annealing performed over a vanishing (in $T$) fraction of training steps. We investigate joint optimization of learning rate and batch size, identifying a degenerate optimality condition. Our model also predicts the compute-optimal scaling laws (where model size and training steps are chosen optimally) in both easy and hard regimes. Going beyond SGD, we consider optimal schedules for the momentum $β(t)$, where speedups in the hard phase are possible. We compare our optimal schedule to various benchmarks in our task including (1) optimal constant learning rates $η_T(t) \sim T^{-ξ}$ (2) optimal power laws $η_T(t) \sim T^{-ξ} t^{-χ}$, finding that our schedule achieves better rates than either of these. Our theory suggests that learning rate transfer across training horizon depends on the structure of the model and task. We explore these ideas in simple experimental pretraining setups.

artificial intelligence, machine learning, optimal learning rate schedule, (11 more...)

arXiv.org Machine Learning

2602.04774

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Error Bounds and Optimal Schedules for Masked Diffusions with Factorized Approximations

Lavenant, Hugo, Zanella, Giacomo

arXiv.org Machine LearningOct-30-2025

Recently proposed generative models for discrete data, such as Masked Diffusion Models (MDMs), exploit conditional independence approximations to reduce the computational cost of popular Auto-Regressive Models (ARMs), at the price of some bias in the sampling distribution. We study the resulting computation-vs-accuracy trade-off, providing general error bounds (in relative entropy) that depend only on the average number of tokens generated per iteration and are independent of the data dimensionality (i.e. sequence length), thus supporting the empirical success of MDMs. We then investigate the gain obtained by using non-constant schedule sizes (i.e. varying the number of unmasked tokens during the generation process) and identify the optimal schedule as a function of a so-called information profile of the data distribution, thus allowing for a principled optimization of schedule sizes. We define methods directly as sampling algorithms and do not use classical derivations as time-reversed diffusion processes, leading us to simple and transparent proofs.

artificial intelligence, information profile, machine learning, (15 more...)

arXiv.org Machine Learning

2510.25544

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Argumentation for Explainable Workforce Optimisation (with Appendix)

Leigh, Jennifer, Letsios, Dimitrios, Mella, Alessandro, Machetti, Lucio, Toni, Francesca

arXiv.org Artificial IntelligenceSep-23-2025

Workforce management is a complex problem involving the optimisation of the makespan and travel distance required for a team of operators to complete a set of jobs, using a set of instruments. A crucial challenge in workforce management is accommodating changes at execution time so that explanations are provided to all stakeholders involved. Here, we show that, by understanding workforce management as abstract argumentation in an industrial application, we can accommodate change and obtain faithful explanations. We show, with a user study, that our tool and explanations lead to faster and more accurate problem solving than conventional manual approaches.

artificial intelligence, natural language, planning & scheduling, (17 more...)

arXiv.org Artificial Intelligence

2508.15118

Country: Europe (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.91)

Add feedback

Learning Augmented Energy Minimization via Speed Scaling

Neural Information Processing SystemsAug-15-2025, 20:00:49 GMT

We initiate the study of a variant of the classic online speed scaling problem, in which machine learning predictions about the future can be integrated naturally.

algorithm, online algorithm, prediction, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland (0.04)
South America > Brazil (0.04)
(7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

af94ed0d6f5acc95f97170e3685f16c0-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 20:00:43 GMT

algorithm, online algorithm, prediction, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
Europe > Switzerland (0.05)
South America > Brazil (0.04)
(9 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)

Add feedback

The Cosine Schedule is Fisher-Rao-Optimal for Masked Discrete Diffusion Models

Zhang, Leo

arXiv.org Machine LearningAug-8-2025

In this work, we study the problem of choosing the discretisation schedule for sampling from masked discrete diffusion models in terms of the information geometry of the induced probability path. Specifically, we show that the optimal schedule under the Fisher-Rao geometry recovers the popularly-used cosine schedule.

artificial intelligence, diffusion model, machine learning, (13 more...)

arXiv.org Machine Learning

2508.04884

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A statistical physics framework for optimal learning

Mignacco, Francesca, Mori, Francesco

arXiv.org Artificial IntelligenceJul-11-2025

Learning is a complex dynamical process shaped by a range of interconnected decisions. Careful design of hyperparameter schedules for artificial neural networks or efficient allocation of cognitive resources by biological learners can dramatically affect performance. Yet, theoretical understanding of optimal learning strategies remains sparse, especially due to the intricate interplay between evolving meta-parameters and nonlinear learning dynamics. The search for optimal protocols is further hindered by the high dimensionality of the learning space, often resulting in predominantly heuristic, difficult to interpret, and computationally demanding solutions. Here, we combine statistical physics with control theory in a unified theoretical framework to identify optimal protocols in prototypical neural network models. In the high-dimensional limit, we derive closed-form ordinary differential equations that track online stochastic gradient descent through low-dimensional order parameters. We formulate the design of learning protocols as an optimal control problem directly on the dynamics of the order parameters with the goal of minimizing the generalization error at the end of training. This framework encompasses a variety of learning scenarios, optimization constraints, and control budgets. We apply it to representative cases, including optimal curricula, adaptive dropout regularization and noise schedules in denoising autoencoders. We find nontrivial yet interpretable strategies highlighting how optimal protocols mediate crucial learning tradeoffs, such as maximizing alignment with informative input directions while minimizing noise fitting. Finally, we show how to apply our framework to real datasets. Our results establish a principled foundation for understanding and designing optimal learning protocols and suggest a path toward a theory of meta-learning grounded in statistical physics.

artificial intelligence, international conference, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.07907

Country:

Europe (0.92)
North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Optimal Scheduling of Dynamic Transport

Tsimpos, Panos, Ren, Zhi, Zech, Jakob, Marzouk, Youssef

arXiv.org Machine LearningApr-19-2025

Flow-based methods for sampling and generative modeling use continuous-time dynamical systems to represent a {transport map} that pushes forward a source measure to a target measure. The introduction of a time axis provides considerable design freedom, and a central question is how to exploit this freedom. Though many popular methods seek straight line (i.e., zero acceleration) trajectories, we show here that a specific class of ``curved'' trajectories can significantly improve approximation and learning. In particular, we consider the unit-time interpolation of any given transport map $T$ and seek the schedule $\tau: [0,1] \to [0,1]$ that minimizes the spatial Lipschitz constant of the corresponding velocity field over all times $t \in [0,1]$. This quantity is crucial as it allows for control of the approximation error when the velocity field is learned from data. We show that, for a broad class of source/target measures and transport maps $T$, the \emph{optimal schedule} can be computed in closed form, and that the resulting optimal Lipschitz constant is \emph{exponentially smaller} than that induced by an identity schedule (corresponding to, for instance, the Wasserstein geodesic). Our proof technique relies on the calculus of variations and $\Gamma$-convergence, allowing us to approximate the aforementioned degenerate objective by a family of smooth, tractable problems.

artificial intelligence, equation, machine learning, (17 more...)

arXiv.org Machine Learning

2504.14425

Country: